Effect of Spam on Hashtag Recommendation for Tweets

Authors

  • Surendra Sedhai
  • Aixin Sun
Abstract

The presence of spam tweets in a dataset may affect the choices of feature selection, algorithm formulation, and system evaluation for many applications. However, most existing studies have not considered the impact of spam tweets. In this paper, we study the impact of spam tweets on hashtag recommendation for hyperlinked tweets (i.e., tweets containing URLs) in the HSpam14 dataset. HSpam14 is a collection of 14 million tweets annotated as spam or ham (i.e., non-spam). In our experiments, we observe that it is much easier to recommend “correct” hashtags for spam tweets than for ham tweets, because of the near duplicates among spam tweets. Simple approaches like recommending the most popular hashtags achieve very good accuracy on spam tweets. On the other hand, features that are highly effective on ham tweets may not be effective on spam tweets. Our findings suggest that without removing spam tweets from the data collection (as in most studies), the results obtained could be misleading for hashtag recommendation tasks.
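The popularity baseline mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the tweet dictionaries, the function names `top_k_hashtags` and `hit_rate`, and the toy data are invented here. They are not taken from HSpam14 and do not reproduce the paper's method or results. The sketch only shows why a fixed list of popular hashtags can score well on near-duplicate tweets and poorly on varied ones.

```python
from collections import Counter


def top_k_hashtags(train_tweets, k=3):
    """Return the k most frequent hashtags across the training tweets."""
    counts = Counter(tag for tweet in train_tweets for tag in tweet["hashtags"])
    return [tag for tag, _ in counts.most_common(k)]


def hit_rate(test_tweets, recommended):
    """Fraction of test tweets sharing at least one hashtag with the recommendation."""
    if not test_tweets:
        return 0.0
    recommended_set = set(recommended)
    hits = sum(1 for tweet in test_tweets if recommended_set & set(tweet["hashtags"]))
    return hits / len(test_tweets)


# Hypothetical toy data: spam tweets are near duplicates, ham tweets are varied.
train = [
    {"hashtags": ["win", "free"], "label": "spam"},
    {"hashtags": ["win"], "label": "spam"},
    {"hashtags": ["win", "free"], "label": "spam"},
    {"hashtags": ["nlp"], "label": "ham"},
    {"hashtags": ["python"], "label": "ham"},
]
test = [
    {"hashtags": ["win"], "label": "spam"},
    {"hashtags": ["free", "win"], "label": "spam"},
    {"hashtags": ["nlp"], "label": "ham"},
    {"hashtags": ["music"], "label": "ham"},
]

# The same recommendation list is evaluated separately on each class.
recommended = top_k_hashtags(train, k=2)
spam_rate = hit_rate([t for t in test if t["label"] == "spam"], recommended)
ham_rate = hit_rate([t for t in test if t["label"] == "ham"], recommended)
print(recommended, spam_rate, ham_rate)
```

Reporting the score per class, rather than over the mixed collection, is the design choice that matters here: a single pooled number would hide the gap between the two groups.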

Similar resources

An analysis of 14 Million tweets on hashtag-oriented spamming

Over the years, Twitter has become a popular platform for information dissemination and information gathering. However, the popularity of Twitter has attracted not only legitimate users but also spammers who exploit social graphs, popular keywords, and hashtags for malicious purposes. In this paper, we present a detailed analysis of the HSpam14 dataset, which contains 14 million tweets with spa...

Impact of Feature Selection on Micro-Text Classification

Social media datasets, especially Twitter tweets, are popular in the field of text classification. Tweets are a valuable source of microtext (sometimes referred to as “micro-blogs”), and have been studied in domains such as sentiment analysis, recommendation systems, spam detection, clustering, among others [6]. Tweets often include keywords referred to as “Hashtags” that can be used as labels fo...

Une méthode collaborative pour identifier les spams: contribution à la qualité de l'information dans les réseaux sociaux

Preventing the actions of malicious users called "spammers" is a real challenge in maintaining a high level of performance in applications implemented in social networks. Conventional spam detection methods impose large and unavoidable processing times, for example up to months for processing large collections of tweets. These methods are entirely dependent on the supervised learning approach chosen to p...

Personalized Hashtag Suggestion for Microblogs

In microblogging services, users can generate hashtags to categorize their tweets. However, a majority of microblogs do not contain hashtags, which has prompted active research on the problem of automatic hashtag recommendation for microblogs. Previous work on this problem mostly does not take the user’s preference into consideration. In this paper, we propose a novel personalized ha...

Automatic Hashtag Recommendation in Social Networking and Microblogging Platforms Using a Knowledge-Intensive Content-based Approach

In social networking/microblogging environments, #tags are often used for categorizing messages and marking their key points. Also, since some social networks such as Twitter apply restrictions on the number of characters in messages, #tags can serve as a useful tool for helping users express their messages. In this paper, a new knowledge-intensive content-based #tag recommendation system is intr...

Publication date: 2016